Dropped personal pronoun recovery in Chinese SMS

Authors

  • Chris Giannella
  • Ransom K. Winder
  • Stacy Petersen

Abstract

In written Chinese, personal pronouns are commonly dropped when they can be inferred from context. This practice is particularly common in informal genres like Short Message Service (SMS) messages sent via cell phones. Restoring dropped personal pronouns can be a useful preprocessing step for information extraction. Dropped personal pronoun recovery can be divided into two subtasks: (1) detecting dropped personal pronoun slots and (2) determining the identity of the pronoun for each slot. We address a simpler version of restoring dropped personal pronouns wherein only the person numbers are identified. After applying a word segmenter, we used a linear-chain conditional random field (CRF) to predict which words were at the start of an independent clause. Then, using the independent clause start information, as well as lexical and syntactic information, we applied a CRF or a maximum-entropy classifier to predict whether a dropped personal pronoun immediately preceded each word and, if so, the person number of the dropped pronoun. We conducted a series of experiments using a manually annotated corpus of Chinese SMS messages. Our machine-learning-based approaches substantially outperformed a rule-based approach based partially on rules developed by Chung and Gildea in 2010. Features derived from parsing largely did not help our approaches. We conclude that the parse information is largely superfluous for identifying dropped personal pronouns if independent clause start information is available.
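
The per-word prediction described in the abstract can be pictured as a sequence-labeling problem. The following is a minimal sketch of that framing only, not the authors' implementation: the `*proN*` slot notation, the label names, the feature names, and the example message are all hypothetical, and "person number" is assumed here to mean first, second, or third person. It turns a segmented, annotated message into (word, label) pairs and builds a small lexical feature dictionary that includes the independent-clause-start flag.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: "NONE" means no dropped pronoun immediately
# precedes the word; "1", "2", "3" give the assumed person of the pronoun.
LABELS = ("NONE", "1", "2", "3")


def is_slot(token: str) -> bool:
    """True for a hypothetical dropped-pronoun marker such as '*pro1*'."""
    return token.startswith("*pro") and token.endswith("*")


def to_labeled_tokens(annotated: List[str]) -> List[Tuple[str, str]]:
    """Remove slot markers and attach each one's person to the next word."""
    labeled: List[Tuple[str, str]] = []
    pending = "NONE"
    for token in annotated:
        if is_slot(token):
            pending = token[4:-1]  # '*pro2*' -> '2'
        else:
            labeled.append((token, pending))
            pending = "NONE"
    return labeled


def token_features(words: List[str], clause_starts: List[bool], i: int) -> Dict[str, object]:
    """Lexical context plus the clause-start flag for word i (illustrative only)."""
    return {
        "word": words[i],
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i < len(words) - 1 else "<EOS>",
        "clause_start": clause_starts[i],
    }


# Invented example: "(you) arrived? (I) will arrive soon."
annotated = ["*pro2*", "到", "了", "吗", "？", "*pro1*", "马上", "到"]
labeled = to_labeled_tokens(annotated)
words = [word for word, _ in labeled]
# In the abstract's pipeline these flags come from a separate CRF;
# here they are written by hand.
clause_starts = [True, False, False, False, True, False]
features = [token_features(words, clause_starts, i) for i in range(len(words))]

for (word, label), feats in zip(labeled, features):
    print(word, label, feats["clause_start"])
```

The feature dictionaries and labels produced this way are the kind of input a CRF or maximum-entropy toolkit would consume. The syntactic (parse-derived) features mentioned in the abstract are omitted from this sketch.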

Similar resources

Neural Recovery Machine for Chinese Dropped Pronoun

Dropped pronouns (DPs) are ubiquitous in pro-drop languages such as Chinese and Japanese. Previous work mainly focused on painstakingly exploring empirical features for DP recovery. In this paper, we propose a neural recovery machine (NRM) to model and recover DPs in Chinese, so as to avoid the non-trivial feature engineering process. The experimental results show that the proposed NRM sig...

Recovering dropped pronouns from Chinese text messages

Pronouns are frequently dropped in Chinese sentences, especially in informal data such as text messages. In this work we propose a solution to recover dropped pronouns in SMS data. We manually annotate dropped pronouns in 684 SMS files and apply machine learning algorithms to recover them, leveraging lexical, contextual and syntactic information as features. We believe this is the first work on...

An Empirical Study on Pronoun Resolution in Chinese

In this paper, we discuss how to identify three important features found by empirical observation: the gender and number features of the antecedent and the grammatical role of the personal pronoun, none of which has an overt mark in Chinese. Only a tagger with an extended POS set and some special word lists are used. Finally, we describe an implemented prototype system to resolve personal pronouns. Evaluation sho...

Chinese Zero Pronoun Resolution: Some Recent Advances

We extend Zhao and Ng's (2007) Chinese anaphoric zero pronoun resolver by (1) using a richer set of features and (2) exploiting the coreference links between zero pronouns during resolution. Results on OntoNotes show that our approach significantly outperforms two state-of-the-art anaphoric zero pronoun resolvers. To our knowledge, this is the first work to report results obtained by an end-toe...

Wh-drop and recoverability

One of the remarkable properties of Verb Second (V2) syntax is the fact that the constituent preceding the finite verb – let’s for simplicity call it ‘SpecCP’ – can under certain circumstances be dropped (cf. Ross 1982; Huang 1984; Fries 1988 for German, Ackema & Neeleman 2007 for Dutch). Pronoun drop is strictly confined to pronouns in SpecCP and can only occur in the root clause. In a context...

Journal title:
  • Natural Language Engineering

Volume 23

Publication date: 2017